Time Variable Reinforcement Learning and Reinforcement Function Design
Abstract
We introduce a mathematical model for time-variable reinforcement learning, in which the policy, the rewards (the reinforcement function) and the transition probabilities may depend on the time t. We prove that, under certain conditions, slightly modified methods of classical dynamic programming are guaranteed to find the optimal policy. To this end, we derive the Bellman equation for the time-variable case and apply the fixed-point theorem. Furthermore, we present a flexible reinforcement-function design framework with adjustable parameters and show its theoretical equivalence to general reinforcement functions. We also introduce a parameter update algorithm (UPA) for the new reinforcement function that guarantees desired ratios of positive, negative and null rewards. In a series of real-robot experiments, we show that using this time-variable reinforcement function can help accelerate learning. We present results comparing the learning progress of wall-following and obstacle-avoidance behaviors implemented with Q-learning and a radial basis function network. As a main result of our work, we address the effectiveness of reinforcement function design and of time-variable reinforcement learning in general.
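The abstract does not spell out the authors' formulation. As a minimal sketch, assuming a finite horizon T and known, tabular, time-dependent rewards R_t and transitions P_t (the names P, R and backward_induction are hypothetical), the time-indexed Bellman equation V_t(s) = max_a [R_t(s, a) + γ Σ_s' P_t(s' | s, a) V_{t+1}(s')] can be solved by standard backward induction. This is a textbook technique for non-stationary MDPs, not the authors' modified dynamic-programming method or their fixed-point argument.

```python
import numpy as np


def backward_induction(P, R, gamma=1.0):
    """Finite-horizon dynamic programming for a time-varying tabular MDP.

    Hypothetical interface (not from the source):
      P[t, s, a, s2] = Pr(s_{t+1} = s2 | s_t = s, a_t = a), shape (T, S, A, S)
      R[t, s, a]     = expected reward for action a in state s at time t, shape (T, S, A)
    Returns V with shape (T + 1, S) and a time-dependent greedy policy pi
    with shape (T, S).
    """
    T, S, A, _ = P.shape
    V = np.zeros((T + 1, S))          # terminal condition: V[T] = 0
    pi = np.zeros((T, S), dtype=int)  # one action per state and per time step
    for t in range(T - 1, -1, -1):
        # Time-indexed Bellman backup:
        # Q_t(s, a) = R_t(s, a) + gamma * sum_{s2} P_t(s2 | s, a) * V_{t+1}(s2)
        Q = R[t] + gamma * (P[t] @ V[t + 1])  # shape (S, A)
        V[t] = Q.max(axis=1)
        pi[t] = Q.argmax(axis=1)
    return V, pi


# Toy example: 2 states, 2 actions, horizon T = 2.
# Action a moves deterministically to state a at every time step.
T, S, A = 2, 2, 2
P = np.zeros((T, S, A, S))
for a in range(A):
    P[:, :, a, a] = 1.0

# The reward changes with time: action 0 pays 1 at t = 0,
# action 1 pays 5 at t = 1.
R = np.array([
    [[1.0, 0.0], [1.0, 0.0]],  # t = 0
    [[0.0, 5.0], [0.0, 5.0]],  # t = 1
])

V, pi = backward_induction(P, R)
print(pi.tolist())    # [[0, 0], [1, 1]]
print(V[0].tolist())  # [6.0, 6.0]
```

In this toy example the optimal action at t = 0 differs from the one at t = 1 in every state, so no single stationary policy over these two states reaches the optimal return of 6; the time index is what lets the policy change.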
Similar resources
Low-Area/Low-Power CMOS Op-Amps Design Based on Total Optimality Index Using Reinforcement Learning Approach
This paper presents the application of reinforcement learning in automatic analog IC design. In this work, the Multi-Objective approach by Learning Automata is evaluated for accommodating required functionalities and performance specifications considering optimal minimizing of MOSFETs area and power consumption for two famous CMOS op-amps. The results show the ability of the proposed method to ...
RRLUFF: Ranking function based on Reinforcement Learning using User Feedback and Web Document Features
The principal aim of a search engine is to provide results sorted according to the user’s requirements. To achieve this aim, it employs ranking methods to rank web documents based on their significance and relevance to the user query. The novelty of this paper is a user-feedback-based ranking algorithm using reinforcement learning. The proposed algorithm is called RRLUFF, in which the rank...
Dynamic Obstacle Avoidance by Distributed Algorithm based on Reinforcement Learning (RESEARCH NOTE)
In this paper we focus on the application of reinforcement learning to obstacle avoidance in dynamic environments in wireless sensor networks. A distributed algorithm based on reinforcement learning is developed for sensor networks to guide a mobile robot through dynamic obstacles. The sensor network models the danger of the area under coverage as obstacles, and has the property of adoption o...
Meta Reinforcement Learning with Latent Variable Gaussian Processes
Data efficiency, i.e., learning from small data sets, is critical in many practical applications where data collection is time consuming or expensive, e.g., robotics, animal experiments or drug design. Meta learning is one way to increase the data efficiency of learning algorithms by generalizing learned concepts from a set of training tasks to unseen, but related, tasks. Often, this relationsh...
Multiple Model-Based Reinforcement Learning
We propose a modular reinforcement learning architecture for nonlinear, nonstationary control tasks, which we call multiple model-based reinforcement learning (MMRL). The basic idea is to decompose a complex task into multiple domains in space and time based on the predictability of the environmental dynamics. The system is composed of multiple modules, each of which consists of a state predict...